Nebius
AE Quick Reference · Confidential
Portfolio by Rus Teston
Account Executive · Technical Cheat Sheet

Selling Nebius AI Cloud
Everything you need on one page.

GPU tiers, workload signals, discovery questions, objection responses, competitive one-liners, and proof points — so you walk into every call prepared.

The One-Sentence Pitch
Nebius is the purpose-built AI cloud — bare-metal NVIDIA GPU performance, available in hours, at the lowest total cost of ownership in the market, with dedicated Solution Architect support from day one.
NVIDIA Reference Platform Partner · Nasdaq-listed · $700M raised · SemiAnalysis Gold Medal (TCO) · HIPAA · SOC 2 · GDPR · ISO 27001
Product Portfolio — What You're Selling
AI Cloud Core
GPU clusters + managed infrastructure for training, fine-tuning, and large-scale inference. Kubernetes, Slurm, storage, observability — all included.
Sell when customer says...
"training a model" · "need a GPU cluster" · "running workloads at scale" · "replacing AWS/GCP"
Token Factory API
Production-ready model inference API — call top open-source models (including DeepSeek R1, served with vLLM) with per-token pricing. No infrastructure to manage.
Sell when customer says...
"just need an API" · "building an app on top of LLMs" · "don't want to manage GPUs" · "per-call pricing preferred"
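To make the "just need an API" pitch concrete, here is a minimal sketch of what integration looks like for the customer. It assumes Token Factory exposes an OpenAI-style chat-completions endpoint; the base URL, model id, and env-var name below are illustrative placeholders, not confirmed product values.

```python
# Hypothetical Token Factory call — the endpoint URL, model id, and env var
# are placeholders for illustration, not confirmed product values.
import json

BASE_URL = "https://tokenfactory.example.nebius/v1"  # placeholder URL
MODEL = "deepseek-r1"                                # illustrative model id

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a standard OpenAI-style chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Draft a one-line summary of our GPU usage.")
print(json.dumps(payload, indent=2))

# The actual call would be a single POST with per-token billing —
# no clusters, drivers, or autoscaling for the customer to manage:
# requests.post(f"{BASE_URL}/chat/completions",
#               headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
#               json=payload)
```

The point for the call: the customer's entire integration surface is one request body — no GPUs, no orchestration, no DevOps.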
GPU Tier Quick Reference — Match Hardware to Workload
GB300 NVL72
Flagship
Highest throughput & TCO for the most demanding AI workloads — rack-scale, liquid-cooled. Contact sales for access.
GB200 NVL72
Flagship
Rack-scale Blackwell. Heavy foundation model training + ultra-low latency reasoning inference. Contact sales.
HGX B300
Training
Next-gen accelerated computing for complex reasoning models and large-scale pre-training workloads.
HGX B200
Balanced
Blackwell air-cooled. Ideal for reasoning LLMs, multi-modal models, and agentic AI. Self-service available.
HGX H200
Extended Mem
Extended GPU memory — predictable performance for LLM + multi-modal training and inference. Self-service.
HGX H100
Cost-Effective
Cost-effective and robust for building and serving foundation models at scale. Best entry point for new customers.
★ All connected via NVIDIA InfiniBand / Quantum-X800 for distributed training · Hopper = H-series · Blackwell = B-series
Workload Decoder — What They Say → What They Need
🧠 Training
They say "Building a model from scratch" / "Pre-training on our dataset" / "Foundation model"
They need Large GPU cluster, high-bandwidth InfiniBand, fast shared storage, Kubernetes or Slurm orchestration
Pitch AI Cloud — multi-node cluster, Soperator, 1 TB/s storage throughput
GPU rec. GB300/GB200 NVL72 or H100/H200 cluster depending on scale
Key proof Recraft trained 20B param model; Photoroom scaled training seamlessly
🔧 Fine-Tuning
They say "Adapting an existing model" / "Domain-specific tuning" / "LoRA / QLoRA"
They need Smaller GPU footprint, managed MLflow for experiment tracking, fast iteration cycles
Pitch AI Cloud — on-demand GPU instances + Managed MLflow + PostgreSQL for metadata
GPU rec. H100 or B200 on-demand; reserve when cycles are predictable
Key proof Wubble — QLoRA adaptation + 1.8s time-to-first-token; Simulacra AI 90% faster compile
⚡ Inference
They say "Serving a model in production" / "Low latency responses" / "Scaling an AI API"
They need High throughput, low latency, autoscaling endpoints, cost-per-token efficiency
Pitch Token Factory API (managed) or AI Cloud + vLLM (self-managed). Depends on control needs.
GPU rec. H200 / B200 for large models; Token Factory if they want zero infra management
Key proof Brave Search: 11M+ AI answers/day, ~100% GPU utilization on Nebius
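The decoder table above can be folded into a tiny lookup for quick pre-call prep or an SA demo script. The mappings below simply encode this sheet's recommendations — they are not an official sizing API.

```python
# Workload → pitch/GPU-tier lookup, mirroring the Workload Decoder table.
# Purely illustrative — this is the cheat sheet encoded, not a sizing API.
RECOMMENDATIONS = {
    "training":    {"pitch": "AI Cloud multi-node cluster (Soperator, 1 TB/s storage)",
                    "gpus": ["GB300 NVL72", "GB200 NVL72", "H200", "H100"]},
    "fine-tuning": {"pitch": "AI Cloud on-demand + Managed MLflow",
                    "gpus": ["H100", "B200"]},
    "inference":   {"pitch": "Token Factory API (managed) or AI Cloud + vLLM",
                    "gpus": ["H200", "B200"]},
}

def recommend(workload: str) -> dict:
    """Return the sheet's pitch and GPU tiers for a workload keyword."""
    key = workload.strip().lower()
    if key not in RECOMMENDATIONS:
        raise ValueError(f"unknown workload: {workload!r} — loop in an SA")
    return RECOMMENDATIONS[key]

print(recommend("inference")["pitch"])
```

Unrecognized workloads deliberately raise an error — that is the "unusual workload, loop in your SA" rule from the escalation section expressed in code.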
6 Discovery Questions That Qualify Any AI Infrastructure Deal
Q1
"What AI workloads are you running today — or planning to run in the next 6 months?"
→ Identifies training vs fine-tuning vs inference; reveals scale and urgency
Q2
"What GPU infrastructure are you using right now, and what's frustrating you about it?"
→ Uncovers incumbent (AWS, GCP, on-prem) and the displacement opportunity
Q3
"How quickly do you need GPU capacity available — and what happens if there's a delay?"
→ Availability speed is a Nebius differentiator (hours vs weeks); quantify the cost of waiting
Q4
"What does your team look like on the infrastructure side — do you have DevOps managing this, or do your ML engineers handle it directly?"
→ Positions managed services and SA support; identifies complexity appetite
Q5
"Are there any compliance or data residency requirements we need to factor in — HIPAA, GDPR, EU-only?"
→ Surfaces regulated industry angle; Nebius has EU DCs (Finland, France, Iceland) + full compliance stack
Q6
"Walk me through your current GPU spend — what's in budget, and what would it take to justify a switch?"
→ Opens TCO conversation; connects to SemiAnalysis study as a third-party anchor
Top Objection Responses
Competitive "We're already on AWS / GCP — why would we switch?"
Don't fight it — add to it. "Most of our customers start exactly there. The question is whether AWS/GCP is optimized for your AI workloads specifically — or whether you're paying hyperscaler tax for infrastructure that wasn't built for ML. SemiAnalysis modeled three real AI workloads and Nebius delivered the lowest TCO across all three. We can run that model against your actual workload in a 30-minute session with one of our Solution Architects."
Commercial "Your pricing seems higher / I need to see a TCO comparison."
Lead with the SemiAnalysis study. "Fair — and we commissioned SemiAnalysis to model exactly this. Across LLM pre-training, multimodal RL research, and production inference, Nebius had the lowest TCO of any provider modeled. The difference is we maximize GPU utilization — bare-metal performance, no hypervisor overhead. We can build your specific workload into the model."
Trust "We've never heard of Nebius — are you a stable company?"
Three anchors: listing, funding, NVIDIA. "Nebius is publicly listed on Nasdaq, so our financials are fully transparent. We raised $700M led by NVIDIA and Accel — NVIDIA doesn't make that investment in a company they don't believe in. We're also an NVIDIA Reference Platform Cloud Partner — a designation held by very few providers globally. And our ISEG supercomputer ranked #19 in the world."
Technical "We're worried about vendor lock-in with a smaller provider."
Openness is a feature, not a compromise. "We're built on open standards — Terraform, Kubernetes, Slurm, standard NVIDIA CUDA. Your workloads run the same way they'd run anywhere else. The Nebius Solution Library on GitHub has all our Terraform recipes publicly available. You're never locked into a proprietary runtime or toolchain."
Trust "We need HIPAA / GDPR compliance — can you support that?"
Full compliance stack, EU data centers. "Yes — Nebius is HIPAA-, SOC 2-, GDPR-, and ISO 27001-compliant with privacy-by-default architecture and tenant-level isolation. For EU data residency specifically, we have data centers in Finland, France, and Iceland. Visit our Trust Center for the full documentation — it's built to answer security questionnaires directly."
Competitive One-Liners
vs AWS
AWS is built for general cloud. Nebius is built for AI. No hypervisor tax, no GPU availability queues, no DevOps overhead — just bare-metal performance available in hours, not weeks.
vs Google Cloud
GCP locks you into Google's proprietary TPU ecosystem. Nebius gives you the latest NVIDIA hardware on open-standard tooling — Kubernetes, Slurm, CUDA — no proprietary runtime lock-in.
vs Azure
Azure is optimized for Microsoft's enterprise stack, not ML-first workloads. Nebius maximizes Model FLOPS Utilization (MFU) — performance on par with leading industry benchmarks at lower TCO.
vs CoreWeave
CoreWeave is GPU-only. Nebius is a full-stack AI cloud — compute plus managed MLOps (MLflow, Spark, PostgreSQL), storage, orchestration, and dedicated SA support included.
Proof Points — Drop These in Deals
11M+
AI-generated answers delivered daily by Brave Search on Nebius — at ~100% GPU utilization
Customer: Brave Search · Workload: Inference
5x
Lower costs compared to major providers, achieved by CentML on Nebius AI Cloud
Customer: CentML · Workload: Inference platform
20B
Model parameters trained by Recraft from scratch — comparable to DALL·E 3 with 49% preference on benchmarks
Customer: Recraft · Workload: GenAI training
90%
Faster model compilation for Simulacra AI — from 2+ hours to 10–20 minutes using Nebius H100/H200 fleet
Customer: Simulacra AI · Workload: Research training
#19
World ranking of ISEG — Nebius's own supercomputer built in Finland, demonstrating in-house infrastructure credibility
Source: TOP500 Supercomputer List
🥇
SemiAnalysis Gold Medal in GPU Cloud ClusterMAX™ Rating — lowest TCO across all three modeled AI workloads
Source: SemiAnalysis ClusterMAX study
When to Bring in Your SA
🟢
Customer asks about multi-node cluster architecture
SA needed → involves InfiniBand topology, Kubernetes/Slurm config, fault tolerance design
🟢
Technical evaluation / POC requested
SA needed → handles environment setup, benchmark design, and results validation
🟢
Customer mentions more than 8 nodes or 1,000+ GPUs
SA + Sales Eng needed → large-scale cluster design and dedicated onboarding required
🟢
Security review or compliance questionnaire
SA + Trust Center docs → SOC 2, HIPAA, GDPR documentation provided by SA
🟢
Customer asks about storage throughput, WEKA, or VAST Data
SA needed → storage architecture is technical; AE sets the stage, SA closes the conversation
🟡
Customer asks "what GPU should I use?"
Use this cheat sheet first — if workload is unusual or scale is large, then loop in SA
🟡
Token Factory API questions
AE can handle pricing + basic use cases; SA only if custom integration or SLA requirements arise